    Near-optimal experimental design for model selection in systems biology

    Motivation: Biological systems are understood through iterations of modeling and experimentation. Not all experiments, however, are equally valuable for predictive modeling. This study introduces an efficient method for experimental design aimed at selecting dynamical models from data. Motivated by biological applications, the method enables the design of crucial experiments: it determines a highly informative selection of measurement readouts and time points. Results: We demonstrate formal guarantees of design efficiency on the basis of previous results. By reducing our task to the setting of graphical models, we prove that the method finds a near-optimal design selection with a polynomial number of evaluations. Moreover, the method exhibits the best polynomial-complexity constant approximation factor, unless P = NP. We measure the performance of the method in comparison with established alternatives, such as ensemble non-centrality, on example models of different complexity. Efficient design accelerates the loop between modeling and experimentation: it enables the inference of complex mechanisms, such as those controlling central metabolic operation. Availability: The toolbox ‘NearOED’ is available with source code under the GPL on the Machine Learning Open Source Software website (mloss.org). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
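
    The guarantees described above (near-optimality with a polynomial number of evaluations, and the best constant approximation factor unless P = NP) are characteristic of greedy maximization of a monotone submodular design criterion. The sketch below is a generic illustration of that greedy recipe, not the NearOED toolbox API; the score function, candidate readouts and time points are all placeholders.

        # Hedged sketch (not the NearOED API): greedy selection of
        # (readout, time point) pairs under a fixed design budget. When the
        # scoring function is monotone submodular, the greedy design is
        # within a (1 - 1/e) factor of the optimum.
        from itertools import product

        def greedy_design(readouts, time_points, score, budget):
            """Greedily pick `budget` (readout, time) pairs maximising `score`.

            `score(design)` is a user-supplied placeholder returning a scalar
            informativeness value for a list of (readout, time) pairs.
            """
            candidates = set(product(readouts, time_points))
            design = []
            for _ in range(budget):
                best = max(candidates, key=lambda c: score(design + [c]))
                design.append(best)
                candidates.remove(best)
            return design

        # Toy usage with a made-up additive score:
        weights = {("mRNA", 1): 0.2, ("mRNA", 5): 0.9,
                   ("protein", 1): 0.5, ("protein", 5): 0.4}
        print(greedy_design(["mRNA", "protein"], [1, 5],
                            lambda d: sum(weights[c] for c in d), budget=2))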

    A combined model reduction algorithm for controlled biochemical systems

    Background: Systems Biology continues to produce increasingly large models of complex biochemical reaction networks. In applications requiring, for example, parameter estimation, the use of agent-based modelling approaches, or real-time simulation, this growing model complexity can present a significant hurdle. Often, however, not all portions of a model are of equal interest in a given setting. In such situations methods of model reduction offer one possible approach for addressing the issue of complexity by seeking to eliminate those portions of a pathway that can be shown to have the least effect upon the properties of interest. Methods: In this paper a model reduction algorithm bringing together the complementary aspects of proper lumping and empirical balanced truncation is presented. Additional contributions include the development of a criterion for selecting state variables for elimination via conservation analysis and the use of an ‘averaged’ lumping inverse. This combined algorithm is highly automatable and of particular applicability in the context of ‘controlled’ biochemical networks. Results: The algorithm is demonstrated here via application to two examples; an 11-dimensional model of bacterial chemotaxis in Escherichia coli and a 99-dimensional model of extracellular signal-regulated kinase (ERK) activation mediated via the epidermal growth factor (EGF) and nerve growth factor (NGF) receptor pathways. In the case of the chemotaxis model the algorithm was able to reduce the model to 2 state-variables, producing a maximal relative error between the dynamics of the original and reduced models of only 2.8% whilst yielding a 26-fold speed up in simulation time. For the ERK activation model the algorithm was able to reduce the system to 7 state-variables, incurring a maximal relative error of 4.8%, and producing an approximately 10-fold speed up in the rate of simulation. Indices of controllability and observability are additionally developed and demonstrated throughout the paper. These provide insight into the relative importance of individual reactants in mediating a biochemical system’s input-output response, even for highly complex networks. Conclusions: Through application, this paper demonstrates that combined model reduction methods can produce a significant simplification of complex Systems Biology models whilst retaining a high degree of predictive accuracy. In particular, it is shown that by combining the methods of proper lumping and empirical balanced truncation it is often possible to produce more accurate reductions than can be obtained by the use of either method in isolation.
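
    As an illustration of one ingredient of the combined algorithm, the sketch below performs plain balanced truncation of a stable linear (or linearised) state-space system via its Gramians; the proper-lumping step and the chemotaxis/ERK models themselves are not reproduced, and the three-state example system is invented.

        # Hedged sketch: balanced truncation of a stable linear system
        # dx/dt = A x + B u, y = C x, via its controllability and
        # observability Gramians (square-root algorithm). The 3-state
        # example below is invented for illustration only.
        import numpy as np
        from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

        def balanced_truncation(A, B, C, r):
            # Gramians: A Wc + Wc A^T = -B B^T,  A^T Wo + Wo A = -C^T C
            Wc = solve_continuous_lyapunov(A, -B @ B.T)
            Wo = solve_continuous_lyapunov(A.T, -C.T @ C)
            Lc = cholesky(Wc, lower=True)
            Lo = cholesky(Wo, lower=True)
            U, s, Vt = svd(Lo.T @ Lc)                  # Hankel singular values s
            T = Lc @ Vt.T @ np.diag(s ** -0.5)         # balancing transformation
            Tinv = np.diag(s ** -0.5) @ U.T @ Lo.T
            Ab, Bb, Cb = Tinv @ A @ T, Tinv @ B, C @ T
            return Ab[:r, :r], Bb[:r, :], Cb[:, :r], s  # keep r dominant states

        A = np.array([[-1.0, 0.0, 0.0],
                      [ 1.0, -2.0, 0.0],
                      [ 0.0, 1.0, -3.0]])   # toy linear 'reaction chain'
        B = np.array([[1.0], [0.0], [0.0]]) # input enters the first species
        C = np.array([[0.0, 0.0, 1.0]])     # the last species is measured
        Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r=2)
        print("Hankel singular values:", hsv)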

    Mathematical and Statistical Techniques for Systems Medicine: The Wnt Signaling Pathway as a Case Study

    The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation, and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical-systems approaches such as nondimensionalization, steady-state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine. Comment: Submitted to 'Systems Medicine' as a book chapter.
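
    As a minimal, self-contained illustration of the Bayesian parameter-estimation workflow discussed in the chapter, the sketch below fits an invented two-parameter production/decay ODE to synthetic data with a random-walk Metropolis sampler; it is not a Wnt pathway model, and the prior, noise level and proposal scale are arbitrary choices.

        # Hedged sketch: random-walk Metropolis sampling of the posterior of a
        # toy production/decay ODE, dx/dt = k_prod - k_deg * x, fitted to
        # synthetic data.
        import numpy as np
        from scipy.integrate import solve_ivp

        def simulate(theta, t_obs):
            k_prod, k_deg = theta
            sol = solve_ivp(lambda t, x: k_prod - k_deg * x, (0.0, t_obs[-1]),
                            [0.0], t_eval=t_obs)
            return sol.y[0]

        def log_posterior(theta, t_obs, y_obs, sigma=0.1):
            if np.any(theta <= 0):                 # flat prior on (0, inf)^2
                return -np.inf
            resid = y_obs - simulate(theta, t_obs)
            return -0.5 * np.sum((resid / sigma) ** 2)

        rng = np.random.default_rng(0)
        t_obs = np.linspace(0.5, 10.0, 20)
        y_obs = simulate((2.0, 0.5), t_obs) + rng.normal(0, 0.1, t_obs.size)

        theta = np.array([1.0, 1.0])
        lp = log_posterior(theta, t_obs, y_obs)
        samples = []
        for _ in range(5000):                      # Metropolis iterations
            prop = theta + rng.normal(0, 0.05, size=2)
            lp_prop = log_posterior(prop, t_obs, y_obs)
            if np.log(rng.random()) < lp_prop - lp:
                theta, lp = prop, lp_prop
            samples.append(theta.copy())
        print("posterior mean (k_prod, k_deg):", np.mean(samples[1000:], axis=0))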

    Methods of model reduction for large-scale biological systems: a survey of current methods and trends

    Complex models of biochemical reaction systems have become increasingly common in the systems biology literature. The complexity of such models can present a number of obstacles for their practical use, often making problems difficult to intuit or computationally intractable. Methods of model reduction can be employed to alleviate the issue of complexity by seeking to eliminate those portions of a reaction network that have little or no effect upon the outcomes of interest, hence yielding simplified systems that retain an accurate predictive capacity. This review paper seeks to provide a brief overview of a range of such methods and their application in the context of biochemical reaction network models. To achieve this, we provide a brief mathematical account of the main methods, including timescale exploitation approaches, reduction via sensitivity analysis, optimisation methods, lumping, and singular value decomposition-based approaches. Methods are reviewed in the context of large-scale systems biology models, and future areas of research are briefly discussed.
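
    As a concrete example of the SVD-based family surveyed here, the sketch below applies a proper-orthogonal-decomposition projection to simulated trajectories of an invented three-species linear system; a real biochemical network would supply the snapshot matrix instead.

        # Hedged sketch: proper orthogonal decomposition (an SVD-based
        # reduction) of trajectories of an invented three-species linear
        # system.
        import numpy as np
        from scipy.integrate import solve_ivp

        A = np.array([[-1.0, 0.5, 0.0],
                      [ 0.5, -2.0, 0.5],
                      [ 0.0, 0.5, -3.0]])               # toy rate matrix
        x0 = np.array([1.0, 0.0, 0.0])
        t = np.linspace(0.0, 10.0, 200)
        snapshots = solve_ivp(lambda s, x: A @ x, (0, 10), x0, t_eval=t).y

        U, sv, _ = np.linalg.svd(snapshots, full_matrices=False)
        Phi = U[:, :2]                                  # two dominant POD modes
        A_red = Phi.T @ A @ Phi                         # projected 2x2 dynamics
        z = solve_ivp(lambda s, x: A_red @ x, (0, 10), Phi.T @ x0, t_eval=t).y
        err = np.max(np.abs(Phi @ z - snapshots))
        print("singular values:", sv, " max reconstruction error:", err)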

    Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses

    Although it is generally accepted that geography is a major factor shaping human genetic differentiation, it is still disputed how much of this differentiation is a result of a simple process of isolation-by-distance, and whether there are factors generating distinct clusters of genetic similarity. We address this question using a geographically explicit simulation framework coupled with an Approximate Bayesian Computation approach. Based on only six simple summary statistics, we estimated the most probable demographic parameters shaping modern human evolution under an isolation-by-distance scenario: an initial population in East Africa spread and grew from 4000 individuals to 5.7 million in about 132 000 years. Subsequent simulations with these estimates followed by cluster analyses produced results nearly identical to those obtained in real data. Thus, a simple diffusion model from East Africa explains a large portion of the genetic diversity patterns observed in modern humans. We argue that a model of isolation by distance along the continental landmasses might be the relevant null model to use when investigating selective effects in humans and probably many other species.
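
    The inference machinery referred to above is, at its core, rejection-based approximate Bayesian computation: draw parameters from the prior, simulate, and keep draws whose summary statistics fall close to the observed ones. The sketch below shows that recipe with a deliberately trivial stand-in simulator and a single summary statistic; it does not reproduce the spatially explicit range-expansion model or its six statistics.

        # Hedged sketch: rejection ABC with a trivial stand-in simulator. The
        # 'observed' statistic, priors and tolerance are invented; they only
        # illustrate the accept/reject mechanics.
        import numpy as np

        rng = np.random.default_rng(1)

        def summary(n0, growth):
            # Toy simulator: log final size after noisy exponential growth.
            return np.log(n0) + growth + rng.normal(0, 0.1)

        observed = np.log(5.7e6)                    # target summary statistic
        accepted = []
        for _ in range(100_000):
            n0 = rng.uniform(1e3, 1e4)              # prior on founding size
            growth = rng.uniform(0.0, 10.0)         # prior on total log-growth
            if abs(summary(n0, growth) - observed) < 0.05:
                accepted.append((n0, growth))
        post = np.array(accepted)
        print(len(post), "accepted; posterior means:", post.mean(axis=0))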

    A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation.

    As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is particularly well suited to the challenging problem of fitting stochastic models to data. In order to demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), a network that is defined by two files containing 'observed' data that we provide as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's relevant functionality to discriminate between two different reaction network models, one of them being the 'true' one. Although the approach is computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems.
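
    ABC-SysBio performs parameter inference and model selection with ABC-SMC; the sketch below is a much-simplified rejection-sampling version of the model-selection idea only (sample a model index and parameters from their priors, accept simulations close to the data, and read approximate posterior model probabilities off the acceptance fractions). It is not the ABC-SysBio API, and the two candidate decay models and the 'observed' data are invented.

        # Hedged sketch: rejection-ABC model selection between two invented
        # decay models (first- vs second-order). Not the ABC-SysBio API or
        # its SMC sampler; the 'observed' data are synthetic.
        import numpy as np
        from scipy.integrate import solve_ivp

        rng = np.random.default_rng(2)
        t_obs = np.linspace(0.0, 5.0, 10)

        def simulate(model, k):
            rhs = (lambda t, x: -k * x) if model == 0 else (lambda t, x: -k * x**2)
            return solve_ivp(rhs, (0.0, 5.0), [1.0], t_eval=t_obs).y[0]

        y_obs = simulate(0, 0.8) + rng.normal(0, 0.02, t_obs.size)  # truth: model 0

        counts = [0, 0]
        for _ in range(5000):
            m = int(rng.integers(2))            # uniform prior over models
            k = rng.uniform(0.0, 2.0)           # prior on the rate constant
            if np.sqrt(np.mean((simulate(m, k) - y_obs) ** 2)) < 0.05:
                counts[m] += 1
        print("accepted per model:", counts,
              " P(model 0 | data) ~", counts[0] / max(sum(counts), 1))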

    Importance of incomplete lineage sorting and introgression in the origin of shared genetic variation between two closely related pines with overlapping distributions

    Genetic variation shared between closely related species may be due to retention of ancestral polymorphisms because of incomplete lineage sorting (ILS) and/or introgression following secondary contact. It is challenging to distinguish ILS and introgression because they generate similar patterns of shared genetic diversity, but this is nonetheless essential for accurately inferring the history of species with overlapping distributions. To address this issue, we sequenced 33 independent intron loci across the genome of two closely related pine species (Pinus massoniana Lamb. and Pinus hwangshanensis Hsia) from Southeast China. Population structure analyses revealed that the species showed slightly more admixture in parapatric populations than in allopatric populations. Levels of interspecific differentiation were lower in parapatry than in allopatry. Approximate Bayesian computation suggested that the most likely speciation scenario explaining this pattern was a long period of isolation followed by secondary contact. Ecological niche modeling suggested that a gradual range expansion of P. hwangshanensis during the Pleistocene climatic oscillations could have been the cause of the overlap. Our study therefore suggests that secondary introgression, rather than ILS, explains most of the shared nuclear genomic variation between these two species and demonstrates the complementarity of population genetics and ecological niche modeling in understanding gene flow history. Finally, we discuss the importance of contrasting results from markers with different dynamics of migration, namely nuclear, chloroplast and mitochondrial DNA.

    Model selection in historical research using approximate Bayesian computation

    Formal Models and History: Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to reevaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation. This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of Lanchester's classical laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes factors. Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. Funding for this work was provided by the SimulPast Consolider Ingenio project (CSD2010-00034) of the former Ministry for Science and Innovation of the Spanish Government and the European Research Council Advanced Grant EPNet (340828).
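
    For reference, the classical Lanchester square and linear laws compared above are small ODE systems; the sketch below integrates both to produce surviving strengths, the kind of simulator output one would feed into an ABC comparison. The coefficients, initial strengths and stopping rule are invented, and the fatigue variant and battle dataset from the paper are not reproduced.

        # Hedged sketch: the classical Lanchester square and linear laws as
        # ODE systems, integrated until one side is effectively annihilated.
        import numpy as np
        from scipy.integrate import solve_ivp

        def square_law(t, y, alpha, beta):
            a, b = y                    # dA/dt = -beta*B, dB/dt = -alpha*A
            return [-beta * b, -alpha * a]

        def linear_law(t, y, alpha, beta):
            a, b = y                    # attrition proportional to both strengths
            return [-beta * a * b, -alpha * a * b]

        def fight(law, a0, b0, alpha, beta, t_end=500.0):
            stop = lambda t, y, *args: min(y) - 1.0   # a side drops below 1 unit
            stop.terminal = True
            sol = solve_ivp(law, (0.0, t_end), [a0, b0],
                            args=(alpha, beta), events=stop)
            return sol.y[:, -1]         # surviving strengths (summary statistic)

        print("square law:", fight(square_law, 1000.0, 800.0, 0.01, 0.01))
        print("linear law:", fight(linear_law, 1000.0, 800.0, 1e-4, 1e-4))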

    Quantifying the roles of host movement and vector dispersal in the transmission of vector-borne diseases of livestock

    The role of host movement in the spread of vector-borne diseases of livestock has been little studied. Here we develop a mathematical framework that allows us to disentangle and quantify the roles of vector dispersal and livestock movement in transmission between farms. We apply this framework to outbreaks of bluetongue virus (BTV) and Schmallenberg virus (SBV) in Great Britain, both of which are spread by Culicoides biting midges and have recently emerged in northern Europe. For BTV we estimate parameters by fitting the model to outbreak data using approximate Bayesian computation, while for SBV we use previously derived estimates. We find that around 90% of transmission of BTV between farms is a result of vector dispersal, while for SBV this proportion is 98%. This difference is a consequence of higher vector competence and shorter duration of viraemia for SBV compared with BTV. For both viruses we estimate that the mean number of secondary infections per infected farm is greater than one for vector dispersal, but below one for livestock movements. Although livestock movements account for a small proportion of transmission and cannot sustain an outbreak on their own, they play an important role in establishing new foci of infection. However, the impact of restricting livestock movements on the spread of both viruses depends critically on assumptions made about the distances over which vector dispersal occurs. If vector dispersal occurs primarily at a local scale (99% of transmission occurs <25 km), movement restrictions are predicted to be effective at reducing spread, but if dispersal occurs frequently over longer distances (99% of transmission occurs <50 km), they are not.
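
    The decomposition described above can be sketched as a simple bookkeeping exercise: sum a distance-dependent infection probability over neighbouring farms for the vector route, add a per-movement probability for the livestock route, and compare the two contributions to the expected number of secondary infections. All farm locations, kernel parameters and probabilities below are invented, not the fitted BTV/SBV estimates.

        # Hedged sketch: splitting the expected secondary infections per
        # infected farm into a vector-dispersal part (distance kernel over
        # neighbouring farms) and a livestock-movement part.
        import numpy as np

        rng = np.random.default_rng(3)
        farms = rng.uniform(0.0, 100.0, size=(200, 2))   # farm locations (km)
        source = farms[0]

        # Vector route: per-farm infection probability decays with distance.
        dist = np.linalg.norm(farms[1:] - source, axis=1)
        p_vector = 0.5 * np.exp(-dist / 5.0)             # 5 km dispersal scale
        R_vector = p_vector.sum()

        # Movement route: a handful of outgoing livestock movements.
        n_moves = rng.poisson(3)                         # movements off the farm
        R_moves = n_moves * 0.1                          # infection prob. per move

        R_total = R_vector + R_moves
        print(f"R_vector={R_vector:.2f}  R_moves={R_moves:.2f}  "
              f"vector share={R_vector / R_total:.0%}")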
